Socioeconomics (also known as social economics) is the social science that studies how economic activity affects and is shaped by social processes. In general, it analyzes how societies progress, stagnate, or regress because of their local or regional economy, or the global economy. Parameters such as GDP per capita, population growth, and life expectancy are commonly used to measure the progress of societies.
Here we analyse socioeconomic conditions across five continents over the period 1952 to 2007. The dataset under investigation is the Gapminder dataset; for further details, please refer to Gapminder. The features used in the analysis and their definitions are:
| Feature | Definition |
|---|---|
| country | Name of the country |
| continent | Name of the continent |
| year | Year in which the observation was recorded |
| lifeExp | Life expectancy: a statistical measure of the average time an organism is expected to live, based on the year of its birth, its current age, and other demographic factors including sex. |
| pop | Population of the country in the given year |
| gdpPercap | GDP per capita (PPP): GDP on a purchasing power parity basis divided by the population as of 1 July of the same year. |
Let's take a quick look at the data. The first five rows show all six features and their associated values.
| country | continent | year | lifeExp | pop | gdpPercap |
|---|---|---|---|---|---|
| Afghanistan | Asia | 1952 | 28.801 | 8425333 | 779.4453 |
| Afghanistan | Asia | 1957 | 30.332 | 9240934 | 820.8530 |
| Afghanistan | Asia | 1962 | 31.997 | 10267083 | 853.1007 |
| Afghanistan | Asia | 1967 | 34.020 | 11537966 | 836.1971 |
| Afghanistan | Asia | 1972 | 36.088 | 13079460 | 739.9811 |
To get a higher-level understanding of the data, a summary table is a crucial next step. The key column lists the variables present in the data; the remaining columns give the mean, the median, and the number of missing values for each variable (mean and median apply only to the numeric variables). The data is clean, with zero missing values.
| key | mean | median | missing |
|---|---|---|---|
| continent | | | 0 |
| country | | | 0 |
| gdpPercap | 7215.33 | 3531.85 | 0 |
| lifeExp | 59.47 | 60.71 | 0 |
| pop | 29601212.32 | 7023595.50 | 0 |
| year | 1979.50 | 1979.50 | 0 |
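A summary of this kind can be sketched in a few lines. The example below is a minimal Python illustration (the output in this document appears to come from R, so this is not the original code). It uses only the five Afghanistan rows shown earlier, so its numbers differ from the full-data table; the helper name `summarise` is hypothetical.

```python
from statistics import mean, median

# The five sample rows shown earlier (Afghanistan, 1952-1972).
rows = [
    {"year": 1952, "lifeExp": 28.801, "pop": 8425333, "gdpPercap": 779.4453},
    {"year": 1957, "lifeExp": 30.332, "pop": 9240934, "gdpPercap": 820.8530},
    {"year": 1962, "lifeExp": 31.997, "pop": 10267083, "gdpPercap": 853.1007},
    {"year": 1967, "lifeExp": 34.020, "pop": 11537966, "gdpPercap": 836.1971},
    {"year": 1972, "lifeExp": 36.088, "pop": 13079460, "gdpPercap": 739.9811},
]


def summarise(rows, columns):
    """Return {column: (mean, median, missing)} for numeric columns."""
    out = {}
    for col in columns:
        # A value counts as missing when the key is absent or set to None.
        values = [r[col] for r in rows if r.get(col) is not None]
        missing = len(rows) - len(values)
        out[col] = (round(mean(values), 2), round(median(values), 2), missing)
    return out


summary = summarise(rows, ["year", "lifeExp", "pop", "gdpPercap"])
for key, (avg, med, missing) in summary.items():
    print(f"{key:10s} mean={avg} median={med} missing={missing}")
```

Because the mean and median are computed only over the values that are present, a non-zero missing count would signal that those statistics rest on fewer rows than the table contains.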
The range of the data can be examined for the three numeric features, lifeExp, pop and gdpPercap, each summarised by continent.
Life expectancy (lifeExp) by continent:
| Continent | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| Africa | 23.6 | 42.37 | 47.79 | 48.87 | 54.41 | 76.44 |
| Americas | 37.58 | 58.41 | 67.05 | 64.66 | 71.7 | 80.65 |
| Asia | 28.8 | 51.43 | 61.79 | 60.06 | 69.51 | 82.6 |
| Europe | 43.58 | 69.57 | 72.24 | 71.9 | 75.45 | 81.76 |
| Oceania | 69.12 | 71.2 | 73.66 | 74.33 | 77.55 | 81.24 |
Population (pop) by continent:
| Continent | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| Africa | 60010 | 1342000 | 4579000 | 9916000 | 10800000 | 1.35e+08 |
| Americas | 662800 | 2962000 | 6228000 | 24500000 | 18340000 | 301100000 |
| Asia | 120400 | 3844000 | 14530000 | 77040000 | 46300000 | 1.319e+09 |
| Europe | 148000 | 4332000 | 8551000 | 17170000 | 21800000 | 82400000 |
| Oceania | 1995000 | 3199000 | 6403000 | 8875000 | 14350000 | 20430000 |
GDP per capita (gdpPercap) by continent:
| Continent | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| Africa | 241.2 | 761.2 | 1192 | 2194 | 2377 | 21950 |
| Americas | 1202 | 3428 | 5466 | 7136 | 7830 | 42950 |
| Asia | 331 | 1057 | 2647 | 7902 | 8549 | 113500 |
| Europe | 973.5 | 7213 | 12080 | 14470 | 20460 | 49360 |
| Oceania | 10040 | 14140 | 17980 | 18620 | 22210 | 34440 |
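The per-continent summaries above follow a simple group-then-summarise pattern. Here is a minimal Python sketch of that pattern, not the original code. It assumes the `inclusive` quantile method of Python's `statistics` module, which should correspond to the default used by R's `quantile`, and it runs on only the ten 1952 rows printed later in this document, so the figures do not match the full-data tables. The function name `six_number_summary` is hypothetical.

```python
from collections import defaultdict
from statistics import mean, quantiles

# (continent, lifeExp) pairs taken from the ten 1952 rows printed in this document.
sample = [
    ("Asia", 28.801), ("Europe", 55.230), ("Africa", 43.077),
    ("Africa", 30.015), ("Americas", 62.485), ("Oceania", 69.120),
    ("Europe", 66.800), ("Asia", 50.939), ("Asia", 37.484),
    ("Europe", 68.000),
]


def six_number_summary(values):
    """Min, quartiles, mean and max, mirroring the columns of the tables above."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values), "q1": q1, "median": q2,
        "mean": mean(values), "q3": q3, "max": max(values),
    }


# Group the life expectancy values by continent.
groups = defaultdict(list)
for continent, life_exp in sample:
    groups[continent].append(life_exp)

# Quartiles need at least two points, so single-row groups are skipped here.
summary = {c: six_number_summary(v) for c, v in groups.items() if len(v) >= 2}

for continent, stats in sorted(summary.items()):
    print(continent, {k: round(v, 2) for k, v in stats.items()})
```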
Let us fit a linear model to get an overview of the relationship between gdpPercap and lifeExp. As the coefficient table shows, gdpPercap is the response and lifeExp is the predictor.
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | -19277 | 914.1 | -21.09 | 6.745e-88 |
| lifeExp | 445.4 | 15.02 | 29.66 | 3.566e-156 |
| Observations | Residual Std. Error | \(R^2\) | Adjusted \(R^2\) |
|---|---|---|---|
| 1704 | 8006 | 0.3407 | 0.3403 |
| | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| lifeExp | 1 | 5.638e+10 | 5.638e+10 | 879.6 | 3.566e-156 |
| Residuals | 1702 | 1.091e+11 | 64100173 | NA | NA |
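The three regression tables are tied together by a few standard identities, which makes them easy to sanity-check. The Python sketch below does not refit the model; it only recomputes the t value, \(R^2\), F value and residual standard error from the rounded figures reported above, so small differences in the last digit are expected. The helper `predict_gdp` is a hypothetical name for the fitted line.

```python
# Values copied from the coefficient, model-summary and ANOVA tables above.
intercept, slope = -19277.0, 445.4
slope_std_error = 15.02
ss_model, ss_resid = 5.638e10, 1.091e11
df_model, df_resid = 1, 1702

# t value = estimate / standard error (reported: 29.66).
t_value = slope / slope_std_error

# R^2 = explained sum of squares / total sum of squares (reported: 0.3407).
r_squared = ss_model / (ss_model + ss_resid)

# F = mean square of the model / mean square of the residuals (reported: 879.6).
f_value = (ss_model / df_model) / (ss_resid / df_resid)

# Residual standard error = sqrt(residual mean square) (reported: 8006).
resid_std_error = (ss_resid / df_resid) ** 0.5


def predict_gdp(life_exp):
    """Fitted gdpPercap for a given life expectancy, using the reported line."""
    return intercept + slope * life_exp


print(round(t_value, 2), round(r_squared, 4), round(f_value, 1), round(resid_std_error))
print(predict_gdp(60.0))  # fitted gdpPercap at a life expectancy of 60 years
```

One consequence of the fitted line is that it crosses zero at a life expectancy of about 43.3 years (19277 / 445.4), so it predicts negative GDP per capita below that point. Together with an \(R^2\) of about 0.34, this suggests a straight line is only a rough description of the relationship.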
The pairwise Pearson correlations between lifeExp, pop and gdpPercap are:
| | lifeExp | pop | gdpPercap |
|---|---|---|---|
| lifeExp | 1.0000000 | 0.0649554 | 0.5837062 |
| pop | 0.0649554 | 1.0000000 | -0.0255996 |
| gdpPercap | 0.5837062 | -0.0255996 | 1.0000000 |
##
## Pearson's product-moment correlation
##
## data: gapminder$pop and gapminder$gdpPercap
## t = -1.0565, df = 1702, p-value = 0.2909
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.07299723 0.02191346
## sample estimates:
## cor
## -0.02559958
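The test statistic and confidence interval in this output can be reproduced from the correlation estimate and the sample size alone. The Python sketch below assumes the usual formulas: \(t = r\sqrt{n-2}/\sqrt{1-r^2}\) for the statistic and the Fisher z-transformation for the interval. It leaves out the p-value, which requires the t distribution and is not available in the standard library.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist

r = -0.02559958   # sample correlation between pop and gdpPercap (from the output)
n = 1704          # number of observations, so df = n - 2 = 1702

# t statistic for H0: true correlation equals 0 (reported: -1.0565).
t_stat = r * sqrt(n - 2) / sqrt(1 - r * r)

# 95% confidence interval via the Fisher z-transformation:
# atanh(r) is approximately normal with standard error 1 / sqrt(n - 3).
z_crit = NormalDist().inv_cdf(0.975)
half_width = z_crit / sqrt(n - 3)
ci_low = tanh(atanh(r) - half_width)
ci_high = tanh(atanh(r) + half_width)

print(round(t_stat, 4))
print(round(ci_low, 5), round(ci_high, 5))
```

Since the interval contains 0 and the reported p-value is 0.2909, the data give no evidence of a linear association between population and GDP per capita.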
## Factor w/ 12 levels "1952","1957",..: 1 2 3 4 5 6 7 8 9 10 ...
Here we can see that the data covers 142 countries and 5 continents, with observations recorded at 12 points in time between 1952 and 2007, one every five years.
Now let’s compare the situation in 1952 with the more recent situation in 2007. We first filter the data for the year 1952. Let’s have a quick look at it before visualising.
## # A tibble: 142 x 6
## country continent year lifeExp pop gdpPercap
## <fctr> <fctr> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.801 8425333 779.4453
## 2 Albania Europe 1952 55.230 1282697 1601.0561
## 3 Algeria Africa 1952 43.077 9279525 2449.0082
## 4 Angola Africa 1952 30.015 4232095 3520.6103
## 5 Argentina Americas 1952 62.485 17876956 5911.3151
## 6 Australia Oceania 1952 69.120 8691212 10039.5956
## 7 Austria Europe 1952 66.800 6927772 6137.0765
## 8 Bahrain Asia 1952 50.939 120447 9867.0848
## 9 Bangladesh Asia 1952 37.484 46886859 684.2442
## 10 Belgium Europe 1952 68.000 8730405 8343.1051
## # ... with 132 more rows
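The filtering step keeps only the rows whose year matches the one of interest. A minimal Python sketch follows, using a handful of rows copied from the tables in this document; the function name `filter_year` is hypothetical and this is not the original code.

```python
# A few rows copied from the 1952 and 2007 tables in this document.
sample = [
    {"country": "Afghanistan", "year": 1952, "lifeExp": 28.801, "gdpPercap": 779.4453},
    {"country": "Afghanistan", "year": 2007, "lifeExp": 43.828, "gdpPercap": 974.5803},
    {"country": "Albania", "year": 1952, "lifeExp": 55.230, "gdpPercap": 1601.0561},
    {"country": "Albania", "year": 2007, "lifeExp": 76.423, "gdpPercap": 5937.0295},
    {"country": "Algeria", "year": 1952, "lifeExp": 43.077, "gdpPercap": 2449.0082},
    {"country": "Algeria", "year": 2007, "lifeExp": 72.301, "gdpPercap": 6223.3675},
]


def filter_year(rows, year):
    """Return the rows recorded in the given year, preserving their order."""
    return [row for row in rows if row["year"] == year]


rows_1952 = filter_year(sample, 1952)
for row in rows_1952:
    print(row["country"], row["lifeExp"], row["gdpPercap"])
```

On the full dataset a filter like this should return one row per country, which is why the table above has 142 rows.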
Now let’s have a look at the visualizations. We are interested in GDP per capita versus life expectancy as a way to evaluate the performance of various countries and continents. Rich and healthy countries appear in the top right corner of the plot, while poor and unhealthy countries appear in the bottom left.
Here we saw that most countries were poor and unhealthy, especially those in Africa and Asia. North American countries were leading with higher GDP per capita and higher life expectancy. In the next plot we connect the countries that belong to the same continent.
Here we can see that Kuwait had huge variations in its GDP per capita because of hard conditions in the country in the 1980s. It faced a stock market crash in that decade, followed by a huge drop in the price of oil, which accounts for a major part of its economy. Before it could recover from these two events, it also faced the Gulf War.
Let’s filter the data for the year 2007 and visualise it for analysis.
## # A tibble: 142 x 6
## country continent year lifeExp pop gdpPercap
## <fctr> <fctr> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 2007 43.828 31889923 974.5803
## 2 Albania Europe 2007 76.423 3600523 5937.0295
## 3 Algeria Africa 2007 72.301 33333216 6223.3675
## 4 Angola Africa 2007 42.731 12420476 4797.2313
## 5 Argentina Americas 2007 75.320 40301927 12779.3796
## 6 Australia Oceania 2007 81.235 20434176 34435.3674
## 7 Austria Europe 2007 79.829 8199783 36126.4927
## 8 Bahrain Asia 2007 75.635 708573 29796.0483
## 9 Bangladesh Asia 2007 64.062 150448339 1391.2538
## 10 Belgium Europe 2007 79.441 10392226 33692.6051
## # ... with 132 more rows
After the analysis we can clearly see that most countries in Europe have made progress in both gdpPercap and lifeExp, whereas most countries in Africa still have lower gdpPercap and lifeExp.
Populous countries, including India and China, have shown a huge increase in lifeExp but only a minor improvement in gdpPercap.
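Comparisons like these can be made concrete by computing, per country, the gain in life expectancy and the growth factor of GDP per capita between 1952 and 2007. The Python sketch below does this for the ten countries whose rows are printed in this document. India and China are not among those rows, so it illustrates the method rather than verifying that specific claim.

```python
# country: ((lifeExp 1952, gdpPercap 1952), (lifeExp 2007, gdpPercap 2007)),
# copied from the 1952 and 2007 tables printed in this document.
sample = {
    "Afghanistan": ((28.801, 779.4453), (43.828, 974.5803)),
    "Albania": ((55.230, 1601.0561), (76.423, 5937.0295)),
    "Algeria": ((43.077, 2449.0082), (72.301, 6223.3675)),
    "Angola": ((30.015, 3520.6103), (42.731, 4797.2313)),
    "Argentina": ((62.485, 5911.3151), (75.320, 12779.3796)),
    "Australia": ((69.120, 10039.5956), (81.235, 34435.3674)),
    "Austria": ((66.800, 6137.0765), (79.829, 36126.4927)),
    "Bahrain": ((50.939, 9867.0848), (75.635, 29796.0483)),
    "Bangladesh": ((37.484, 684.2442), (64.062, 1391.2538)),
    "Belgium": ((68.000, 8343.1051), (79.441, 33692.6051)),
}

changes = {}
for country, ((life_1952, gdp_1952), (life_2007, gdp_2007)) in sample.items():
    changes[country] = {
        # Absolute gain in life expectancy, in years.
        "lifeExp_gain": round(life_2007 - life_1952, 3),
        # Growth factor of GDP per capita (2007 value divided by 1952 value).
        "gdp_ratio": round(gdp_2007 / gdp_1952, 2),
    }

# Print countries ordered by life expectancy gain, largest first.
ranked = sorted(changes.items(), key=lambda kv: kv[1]["lifeExp_gain"], reverse=True)
for country, change in ranked:
    print(f"{country:12s} +{change['lifeExp_gain']:.3f} years, gdpPercap x{change['gdp_ratio']}")
```

A large life expectancy gain paired with a growth factor close to 1 corresponds to the pattern described above: health improving faster than income.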
## # A tibble: 12 x 6
## # Groups: year [12]
## country continent year lifeExp pop gdpPercap
## <fctr> <fctr> <int> <dbl> <int> <dbl>
## 1 Kuwait Asia 1952 55.565 160000 108382.35
## 2 Kuwait Asia 1957 58.033 212846 113523.13
## 3 Kuwait Asia 1962 60.470 358266 95458.11
## 4 Kuwait Asia 1967 64.624 575003 80894.88
## 5 Kuwait Asia 1972 67.712 841934 109347.87
## 6 Kuwait Asia 1977 69.343 1140357 59265.48
## 7 Kuwait Asia 1982 71.309 1497494 31354.04
## 8 Kuwait Asia 1987 74.174 1891487 28118.43
## 9 Kuwait Asia 1992 75.190 1418095 34932.92
## 10 Kuwait Asia 1997 76.156 1765345 40300.62
## 11 Kuwait Asia 2002 76.904 2111561 35110.11
## 12 Kuwait Asia 2007 77.588 2505559 47306.99
## # A tibble: 300 x 6
## country continent year lifeExp pop gdpPercap
## <fctr> <fctr> <int> <dbl> <int> <dbl>
## 1 Canada Americas 2007 80.653 33390141 36319.235
## 2 Canada Americas 2002 79.770 31902268 33328.965
## 3 Costa Rica Americas 2007 78.782 4133884 9645.061
## 4 Puerto Rico Americas 2007 78.746 3942491 19328.709
## 5 Canada Americas 1997 78.610 30305843 28954.926
## 6 Chile Americas 2007 78.553 16284741 13171.639
## 7 Cuba Americas 2007 78.273 11416987 8948.103
## 8 United States Americas 2007 78.242 301139947 42951.653
## 9 Costa Rica Americas 2002 78.123 3834934 7723.447
## 10 Canada Americas 1992 77.950 28523502 26342.884
## # ... with 290 more rows
## y
## x Africa Americas Asia Europe Oceania
## 1 624 300 374 360 24
## 2 0 0 22 0 0
## y
## x Africa Americas Asia Europe Oceania
## 1 52 25 31 30 2
## 2 0 0 2 0 0
| Africa | Americas | Asia | Europe | Oceania |
|---|---|---|---|---|
| 52 | 25 | 31 | 30 | 2 |
| 0 | 0 | 2 | 0 | 0 |